
Journal of Vision

Association for Research in Vision and Ophthalmology (ARVO)

All preprints, ranked by how well they match Journal of Vision's content profile, based on 92 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
EasyEyes - Accurate fixation for online vision testing of crowding and beyond

Kurzawski, J. W.; Pombo, M.; Burchell, A.; Hanning, N. M.; Liao, S.; Majaj, N. J.; Pelli, D.

2023-07-18 neuroscience 10.1101/2023.07.14.549019 medRxiv
Top 0.1%
70.3%

Online methods allow testing of larger, more diverse populations, with much less effort than in-lab testing. However, many psychophysical measurements, including visual crowding, require accurate eye fixation, which is classically achieved by testing only experienced observers who have learned to fixate reliably, or by using a gaze tracker to restrict testing to moments when fixation is accurate. Alas, both approaches are impractical online since online observers tend to be inexperienced, and online gaze tracking, using the built-in webcam, has a low precision (±4 deg, Papoutsaki et al., 2016). The EasyEyes open-source software reliably measures peripheral thresholds online with accurate fixation achieved in a novel way, without gaze tracking. EasyEyes tells observers to use the cursor to track a moving crosshair. At a random time during successful tracking, a brief target is presented in the periphery. The observer responds by identifying the target. To evaluate EasyEyes fixation accuracy and thresholds, we tested 12 naive observers in three ways in a counterbalanced order: first, in the lab, using gaze-contingent stimulus presentation (Kurzawski et al., 2023; Pelli et al., 2016); second, in the lab, using EasyEyes while independently monitoring gaze; third, online at home, using EasyEyes. We find that crowding thresholds are consistent (no significant differences in mean and variance of thresholds across ways) and individual differences are conserved. The small root mean square (RMS) fixation error (0.6 deg) during target presentation eliminates the need for gaze tracking. Thus, EasyEyes enables fixation-dependent measurements online, for easy testing of larger and more diverse populations.
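
The reported 0.6 deg RMS fixation error is a standard summary statistic. As a minimal sketch (the function and the sample values below are illustrative, not the paper's data), it can be computed from paired gaze and crosshair positions:

```python
import numpy as np

def rms_fixation_error(gaze_xy, crosshair_xy):
    """Root-mean-square distance (deg) between gaze samples and the
    crosshair position at the same time points."""
    err = np.linalg.norm(np.asarray(gaze_xy) - np.asarray(crosshair_xy), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

# Hypothetical gaze samples (deg) around a crosshair at the origin:
gaze = [(0.3, -0.4), (0.5, 0.0), (-0.2, 0.1)]
target = [(0.0, 0.0)] * 3
print(round(rms_fixation_error(gaze, target), 3))
```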

2
A deep convolutional neural network trained for lightness constancy is susceptible to lightness illusions

Patel, J.; Flachot, A.; Vazquez-Corral, J.; Brainard, D. H.; Wallis, T. S. A.; Brubaker, M. A.; Murray, R. F.

2025-11-12 neuroscience 10.1101/2025.11.10.687742 medRxiv
Top 0.1%
64.7%

Human viewers are able to perform tasks that depend on accurate estimates of surface reflectance, even across large changes in illumination and context. This is a remarkable ability, and successful image-computable models of how the visual system achieves this have remained elusive. Recently, deep convolutional neural networks (CNNs) have been developed that are adept at estimating surface reflectance. Here we evaluated one such network as a starting point for a new model of human lightness perception by testing whether it was susceptible to a range of classic lightness illusions. We implemented a CNN and trained it via supervised learning to estimate surface reflectance at each pixel in grayscale, rendered images of geometric objects. We examined the network's output on several illusions, including the argyle, Koffka, snake, simultaneous contrast, White's, and checkerboard illusions, as well as control figures. We included variants where low-luminance regions important to the illusions were generated either by low reflectance or by cast shadows. For comparison, we carried out a lightness matching experiment with human observers using the same stimuli, and also examined the outputs of three classic lightness and brightness models. The CNN largely removed lighting effects such as shading and shadows, and produced good reflectance estimates on a test set. It also qualitatively predicted the illusions perceived by humans in most cases, the exceptions being White's and the checkerboard illusions. The CNN outperformed classical models, both at estimating reflectance and at tracking human lightness matches. These findings support a normative view of lightness perception and highlight the promise of deep learning models in this area.

3
Deep neural networks trained for estimating albedo and illumination achieve lightness constancy differently than human observers.

Flachot, A.; Patel, J.; Wallis, T. S. A.; Brubaker, M. A.; Brainard, D. H.; Murray, R. F.

2025-07-15 neuroscience 10.1101/2025.07.10.664065 medRxiv
Top 0.1%
53.7%

Lightness constancy, the ability to create perceptual representations that are strongly correlated with surface albedo despite variations in lighting and context, is a challenging computational problem. Indeed, it has proven difficult to develop image-computable models of how human vision achieves a substantial degree of lightness constancy in complex scenes. Recently, convolutional neural networks (CNNs) have been developed that are proficient at estimating albedo, but little is known about how they achieve this, or whether they are good models of human vision. We examined this question by training a CNN to estimate albedo and illumination in a computer-rendered virtual world, and evaluating both the CNN and human observers in a lightness matching task. In several conditions, we eliminated cues potentially supporting lightness constancy: local contrast, shading, shadows, and all contextual cues. We found that the network achieved a high degree of lightness constancy, outperforming three classic models, and substantially outperforming human observers as well. However, we also found that eliminating cues affected the CNN and humans very differently. Humans had much worse constancy when local contrast cues were made uninformative, but were minimally affected by elimination of shading or shadows. The CNN was unaffected by local contrast, but relied on shading and shadows. These results suggest that the CNN followed an effective strategy of integrating global image cues, whereas humans used a more local strategy. In a follow-up experiment, we found that the CNN could learn to exploit noise artifacts that were correlated with illuminance in ray-traced scenes, whereas humans did not. We conclude that CNNs can learn an effective, global strategy of estimating lightness, which is closer to an optimal strategy for the ensemble of scenes we studied than the computation used by human vision.

4
A bias in transsaccadic perception of spatial frequency changes

Sharvashidze, N.; Hübner, C.; Schütz, A. C.

2024-03-13 neuroscience 10.1101/2024.03.11.584439 medRxiv
Top 0.1%
52.6%

Visual processing differs between the foveal and the peripheral visual field. These differences can lead to different appearances of objects in the periphery and the fovea, which poses a challenge to perception across saccades. Differences in the appearance of visual features between the peripheral and foveal visual field may bias change discrimination across saccades. Previously it has been reported that spatial frequency (SF) appears higher in the periphery compared to the fovea (Davis et al., 1987). In this study, we investigated the visual appearance of SF before and after a saccade and the discrimination of SF changes implemented during saccades. In addition, we tested the contributions of pre- and postsaccadic information to change discrimination performance. In the first experiment, we found no differences in the appearance of SF before and after a saccade. However, participants showed a clear bias to report SF increases. Interestingly, a 200-ms postsaccadic blank period improved the precision of the responses but did not affect the bias. In the second experiment, participants showed lower thresholds for SF increases than for decreases, suggesting that the bias in the first experiment was not just a response bias. Finally, we asked participants to discriminate the SF of stimuli presented before a saccade. Thresholds in the presaccadic discrimination task were lower than thresholds in the change discrimination task, suggesting that transsaccadic change discrimination is not merely limited by presaccadic discrimination in the periphery. The change direction bias might stem from more effective masking or overwriting of the presaccadic stimulus by the postsaccadic low SF stimulus. 

5
Visual confidence accurately tracks increasing internal noise with eccentricity in peripheral vision

Li, L.; Landy, M. S.

2026-02-01 neuroscience 10.64898/2026.01.28.702447 medRxiv
Top 0.1%
51.0%

Sensory representations are inherently noisy, and monitoring this noise is essential for effective decision-making. This metacognitive ability of evaluating the quality of one's perceptual decisions is referred to as perceptual confidence. However, whether perceptual confidence accurately tracks internal noise remains unresolved. Peripheral vision provides a natural testing ground for this question, yet previous studies report mixed results complicated by different definitions and measurements of confidence. Here, we used a normative Bayesian framework with incentivized confidence measurements to address these discrepancies. We tested the Bayesian-confidence hypothesis that confidence is derived from the posterior probability distribution of the feature being judged, given noisy sensory measurements. We tested two perceptual tasks while varying stimulus eccentricity: spatial localization and orientation estimation. We measured confidence by post-decision wagering, in which participants set a symmetrical range around their perceptual estimates. Participants earned higher reward for narrower confidence ranges but received zero reward if the range did not enclose the target. We estimated sensory noise from the perceptual responses to predict confidence, assuming that sensory noise increases linearly with eccentricity. We then compared a normative Bayesian model with three alternative models that challenged different assumptions. Across both tasks, the Bayesian ideal-observer model best predicted confidence. These results suggest that humans can accurately monitor the increased internal noise in peripheral vision and use this information to make optimal confidence judgments.
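
The post-decision wagering scheme can be made concrete with a small sketch. The linear payoff schedule and all numbers here are assumptions for illustration; the abstract does not specify the actual incentive function. The point is that an observer maximizing expected reward should set a wider confidence range when sensory noise is larger:

```python
import numpy as np
from math import erf, sqrt

def expected_reward(w, sigma, w_max=10.0):
    """Hit probability for a Gaussian error times an (assumed) reward
    that falls off linearly with interval half-width w."""
    p_hit = erf(w / (sigma * sqrt(2)))   # P(|error| < w), error ~ N(0, sigma)
    reward = max(0.0, 1.0 - w / w_max)   # assumed payoff schedule
    return p_hit * reward

def optimal_halfwidth(sigma, grid=np.linspace(0.01, 10.0, 2000)):
    """Half-width maximizing expected reward, by grid search."""
    return float(grid[np.argmax([expected_reward(w, sigma) for w in grid])])

# Higher sensory noise (e.g., larger eccentricity) -> wider optimal range:
print(optimal_halfwidth(0.5) < optimal_halfwidth(2.0))
```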

6
Head and eye movement planning differ in access to information during visual search

Durant, S.; Watson, T.

2022-07-26 neuroscience 10.1101/2022.05.30.493999 medRxiv
Top 0.1%
46.2%

To characterize the process of visual search, reaction time is measured relative to stimulus onset, when the whole search field is presented in view simultaneously. Salient objects are found faster, suggesting that they are detected using peripheral vision (rather than each object being fixated in turn). This work investigated how objects are detected in the periphery when onset in the visual field is due to head movement. Is the process of target detection similarly affected by salience? We tested this in 360-degree view with free head and eye movement, using a virtual reality headset with eye tracking. We presented letters and Gabor patches as stimuli in separate experiments. Four clusters were arranged horizontally such that two clusters were visible at onset on either side of a fixation cross (near location) while the other two entered the field of view (FoV) when the participant made an appropriate head movement (far location). In both experiments we varied whether the target was less or more salient. We found an interesting discrepancy: across both tasks and locations, the first eye movement to land near a cluster was closer to the salient target, even though salience did not lead to a faster head movement towards a cluster at the far locations. We also found that the planning of head movement changed the landing of gaze position to be targeted more towards the centres of the clusters at the far locations, leading to more accurate initial gaze positions relative to the target, regardless of salience. This suggests that the spatial information available for targeting of eye movements within a given FoV is not always available for the planning of head movements, and that how a target appears in view affects gaze-targeting accuracy.

7
Given the birds, where is the flock? Visual estimation of the location of collections of points

Ota, K.; Wu, Q.; Mamassian, P.; Maloney, L.

2025-07-16 neuroscience 10.1101/2025.07.10.664170 medRxiv
Top 0.1%
44.7%

A key step in perceptual organization is segmentation of a scene into wholes made of parts: birds form flocks, pedestrians form crowds. The parts have spatial locations, as does the whole, and we can ask: how do the locations of the parts influence the perceived location of the whole? The answer may depend on the nature of the parts and the processes that generate them: the influence of single birds on the location of a flock may be different from that of single pedestrians on the location of a crowd. We treated the parts as samples from a probability density function (PDF) and asked participants to estimate the location of the generating PDF given a sample. The generating PDFs belonged to one of three location families - Gaussian, Laplacian or Uniform. Observers received training with each family and knew which family the generating PDF came from on each trial. We compared human performance to that of the Uniform Minimum Variance Unbiased Estimator (UMVUE) for each location family. We based our analyses on the measured influence of each sample point, a measure of how the observer made use of each sample point in estimating the center. Observers used different estimators for different families, and the estimator chosen for samples from each family was close to the UMVUE for that family. How does the observer calculate approximate UMVUEs for the three location families considered? We propose that a critical step is to estimate the locations of visual clusters in the sample. The Visual Cluster Model accurately captured human performance across all three distributions. Our findings suggest that the observer's estimate of the location of the whole is based not directly on the locations of the parts but rather on the clusters they form.
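
For reference, the classical efficient location estimators associated with these three families are the sample mean (Gaussian), the sample median (Laplacian; strictly the maximum-likelihood estimator rather than an exact UMVUE), and the midrange (Uniform). A minimal sketch:

```python
import numpy as np

def location_estimate(sample, family):
    """Classical efficient location estimator for each family:
    mean (Gaussian), median (Laplacian), midrange (Uniform)."""
    x = np.asarray(sample, dtype=float)
    if family == "gaussian":
        return float(np.mean(x))
    if family == "laplacian":
        return float(np.median(x))
    if family == "uniform":
        return float((x.min() + x.max()) / 2)
    raise ValueError(family)

# The same sample yields different center estimates under each family:
pts = [1.0, 2.0, 2.5, 9.0]
print(location_estimate(pts, "gaussian"))   # 3.625
print(location_estimate(pts, "laplacian"))  # 2.25
print(location_estimate(pts, "uniform"))    # 5.0
```

The divergence between the three estimates on an asymmetric sample like this is what makes the task diagnostic of which estimator observers actually use.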

8
When the brightest is not the best: illuminant estimation from the geometry of specular highlights

Morimoto, T.; Lee, R. J.; Smithson, H. E.

2026-01-24 neuroscience 10.64898/2026.01.22.700600 medRxiv
Top 0.1%
44.5%

Color constancy allows us to perceive stable object colors under different lighting conditions by reducing the impact of lighting. Information about illuminant color could be derived from a white surface or a specular highlight. The "brightest is white" heuristic has been frequently incorporated in illumination estimation models to identify illuminant color. Here, we tested an alternative hypothesis: we use structured changes in the proximal image to identify highlight regions, even when they are not the brightest elements in the scene. In computer-rendered scenes, we varied the reliability of "brightest element" and "highlight geometry" cues, testing their effect on a color constancy task. Each scene had a single spherical surface lit by several point lights with identical spectral properties. The surface had a uniform spectral reflectance but a noise texture that attenuated the reflectance by a variable scale factor. We tested three levels of specularity: zero (matte), low, and mid. Observers watched a 1.5-second animation and reported whether color changes were due to illuminant or material changes. Discrimination performance for matte surfaces was nearly at chance level, as predicted. However, as specularity increased, performance improved significantly. Observers outperformed an ideal-observer model that relied solely on the brightest element. Notably, when the specular region appeared on a dark part of the texture, observer performance improved even more--even though the brightest element heuristic would predict a decrease. When specular geometries were difficult to identify due to phase scrambling, observer performance significantly dropped. These results suggest that we do not simply rely on the brightest element, but rather utilize regularities of diffuse and specular components of the proximal image to solve surface and illuminant ambiguities.

9
Optimal colors can predict luminosity thresholds in natural scenes

Duay, K.; Nagai, T.

2025-08-11 neuroscience 10.1101/2025.08.07.669217 medRxiv
Top 0.1%
44.4%

Luminosity thresholds define the luminance boundary at which a surface color shifts in appearance from being perceived as an illuminated surface to appearing self-luminous. Previous research suggests that the human visual system infers these thresholds based on internal references of physically realizable surface colors under a given illumination, referred to as the physical gamut. A surface is perceived as self-luminous when its luminance exceeds the upper limit of this empirically internalized gamut. However, the precise structure and boundaries of these gamuts remain uncertain. Optimal colors, which represent theoretical surface reflectances under specific illuminants, have been shown to provide an effective model for visualizing and computing the physical gamuts. In prior studies, optimal colors have successfully predicted luminosity thresholds; however, these findings were limited to highly simplified, abstract stimuli. Whether this framework generalizes to more naturalistic viewing conditions has remained an open question. In the present study, we demonstrate that the theory of an internal reference in the form of an empirically constructed physical gamut, visualized through optimal colors, remains valid under more natural conditions. Our results confirm that optimal colors can still accurately predict luminosity thresholds in such settings. Moreover, our findings suggest that the luminosity thresholds encompass both self-luminosity and naturalness concepts. In turn, this may imply that the notion of physical gamut could envelop both concepts as well and could be defined as "all physically possible colors in a scene for an object that does not emit light." These insights have profound potential implications for both applied fields (e.g., XR or projection mapping) and fundamental science (e.g., understanding human visual processing mechanisms).

10
Connecting the dots - Recognition of artificial and natural shapes relies on representing points of high information

Schmidtmann, G.; Baker, N.; Lande, K. J.; Schmidt, F.

2025-11-17 neuroscience 10.1101/2025.11.17.688832 medRxiv
Top 0.1%
39.8%

Physiological and psychophysical evidence suggests that the visual system represents object outlines based on prominent curvature features, in particular regions of extreme curvature (such as convex maxima and concave minima). Curvature extrema often coincide with points of high information content ("surprisal," in information-theoretic terms). However, this relationship is only correlational. To date, no study has directly compared the role of curvature extrema with the role of surprisal itself. Does the visual system selectively encode curvature extrema because they tend to be informative--because they are heuristic proxies for high-surprisal points along the contour--or does it directly encode informative points that happen often to be located at curvature extrema? We addressed this question in a series of shape-matching experiments, testing how curvature extrema and information content contribute to recognition. Observers (N = 7) matched a smooth test shape to one of two re-scaled shapes (target and distractor) constructed by connecting, with straight lines, points corresponding to (i) curvature maxima, (ii) both curvature maxima and minima, or (iii) points of maximum surprisal. A baseline condition used identical test and target shapes. Stimuli included artificial shapes composed of compound radial frequency patterns and natural shapes (animal outlines), the latter enabling us to disentangle curvature and information effects by restricting sampled points. Recognition performance was higher for natural than artificial shapes (95% vs. ~86%). Performance for shapes containing a few points of high information matched performance on trials containing all curvature extrema and baseline trials. It also exceeded performance for shapes with curvature maxima alone (65% vs. ~90%). These findings suggest that shape representation emphasizes features with high informational content rather than curvature extrema per se.
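
Surprisal in the information-theoretic sense is simply -log₂ p. A hedged sketch of how high-surprisal contour points could be identified from an empirical distribution of turning angles (the binning scheme and values here are illustrative, not the authors' procedure):

```python
import numpy as np

def surprisal(values, bins=12):
    """Per-point surprisal, -log2 p, where p is the empirical probability
    of each point's (binned) value along the contour."""
    v = np.asarray(values, dtype=float)
    counts, edges = np.histogram(v, bins=bins)
    p = counts / counts.sum()
    # Map each value back to its bin index, then look up its probability:
    idx = np.clip(np.digitize(v, edges[1:-1]), 0, bins - 1)
    return -np.log2(p[idx])

# Turning angles (radians) along a hypothetical contour: the single
# sharp turn (1.5) is the most surprising, hence most informative, point.
angles = [0.05, 0.02, 0.04, 1.5, 0.03, 0.06]
print(int(np.argmax(surprisal(angles))))  # 3
```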

11
Quantifying and predicting chromatic thresholds

Zhou, J.

2023-06-11 neuroscience 10.1101/2023.06.06.543898 medRxiv
Top 0.1%
39.8%

Perceptual thresholds measured in the two-dimensional chromatic diagram are elliptical in shape. Across different parts of the chromatic diagram, these ellipses vary in their sizes, their tilting angles, and in how much they elongate. Overall, the chromatic thresholds exhibit intriguing patterns that were reflected in MacAdam's measurements in 1942. Previously, da Fonseca and Samengo (2016) used a neural model combined with Fisher information (a quantification of perceptual thresholds) to predict the pattern of chromatic thresholds measured in human observers. The model assumes linear cone responses paired with Poisson noise. I extended this analysis and studied two additional aspects of chromatic perception. First, I quantified how the pattern of chromatic thresholds varies when the proportions of the three cone types (short-, mid-, and long-wavelength) vary. This analysis potentially leads to efficient estimation of thresholds across the chromatic diagram. Second, I analyzed to what extent the assumption of Poisson noise contributes to the threshold predictions. Surprisingly, eliminating Poisson noise improves the model's predictions. In addition to Poisson noise, I therefore examined three alternative noise assumptions, and achieved improved predictions of MacAdam's data. Finally, I examined an application of the improved model predictions. The total number of cones, as well as the proportion of S cones, varies across retinal eccentricities. I showed that these two variations predict drastically different chromatic threshold patterns across retinal eccentricities.
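
The Fisher-information calculation this line of work builds on has a compact form: for linear cone responses f = Ac with independent Poisson noise, the information matrix is I(c) = Aᵀ diag(1/f) A, and discrimination thresholds along a direction u scale as 1/√(uᵀ I u). A sketch with an illustrative (not colorimetrically calibrated) cone matrix:

```python
import numpy as np

# Illustrative, hypothetical cone weighting matrix (rows: L, M, S cones):
A = np.array([[0.70, 0.30, 0.02],
              [0.30, 0.60, 0.10],
              [0.02, 0.10, 0.90]])

def fisher_info(c):
    """Fisher information for linear cone responses f = A @ c under
    independent Poisson noise: I = A' diag(1/f) A."""
    f = A @ np.asarray(c, dtype=float)   # mean cone catches
    return A.T @ np.diag(1.0 / f) @ A

I = fisher_info([1.0, 1.0, 1.0])
# Threshold along a chromatic direction u scales as 1/sqrt(u' I u):
u = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
print(round(float(1.0 / np.sqrt(u @ I @ u)), 3))
```

The threshold ellipses in the chromatic diagram are level sets of δᵀ I δ, so their size and orientation follow directly from the eigenstructure of I.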

12
The amblyopic acuity deficit: impact on the identification of letters distorted by spatial scrambling algorithms

Zhu, R. X.; Hess, R. F.; Baldwin, A. S.

2025-07-07 neuroscience 10.1101/2025.07.02.662570 medRxiv
Top 0.1%
39.1%

The letter acuity impairment in the amblyopic eye often exceeds predictions made from the cut-off spatial frequency for grating detection. Spatial scrambling in the amblyopic eye's projections to the visual cortex has been proposed to bear some responsibility for this additional deficit. Using a novel stimulus algorithm that creates spatially scrambled bandpass letters, we generated stimuli simulating either: i) "cortical scrambling" at the output of oriented model "simple cells", or ii) "subcortical scrambling" of isotropic subunits that combine to form these simple cells. We also investigated a more conventional "noise masking" with bandpass noise. We performed two bandpass letter identification experiments, equating the stimuli shown to each eye by normalising either: i) their contrast, presenting them at four times their contrast detection threshold; or ii) their spatial scale, presenting them at twice the participant's acuity threshold for each eye. At the group level, we found that the amblyopic eye is less efficient at performing letter identification in bandpass noise. We did not find an overall significant difference with either scrambling type when comparing efficiency between the amblyopic and fellow eye, but we did find such a difference when partitioning our participants by their stereopsis ability. In further analyses of the pattern of mistakes, we found that the amblyopic eye shows a distinctive behaviour which correlates with the acuity deficit for both types of scrambling. These results demonstrate that our scrambled stimuli interrogate a component of amblyopic vision that is functionally distinct from that addressed by contrast noise masking.

13
Vergence accuracy in an autostereoscopic display

Lo Verde, L.; Norcia, A. M.

2021-07-30 neuroscience 10.1101/2021.07.29.454355 medRxiv
Top 0.1%
38.8%

When fixating an object, observers typically under- or over-converge by a small amount, a phenomenon known as "fixation disparity". Fixation disparity is typically measured with physical fixation targets and dichoptically presented nonius lines. Here we made fixation disparity measurements with an autostereoscopic display, varying the retinal eccentricity and disparity of the fixation targets. Measurements were made in a group of four practiced observers and in a group of thirteen experimentally naive observers. Fixation disparities with a zero-disparity target were in the direction of fixation behind the plane of the screen, and the magnitude of the fixation disparity grew with the eccentricity of the fixation targets (1-5 deg in the practiced observers and 1-10 deg in the naive observers). Fixation disparity also increased with increasing disparity of the targets, especially when they were presented at crossed disparities. Fixation disparities were larger overall for naive observers, who additionally did not converge in front of the screen when vergence demand was created by crossed disparity fusion locks presented at 5 and 10 deg eccentricities.

14
The influence of similarity, sensitivity and bias on letter identification

Barhoom, H.; Joshi, M. R.; Schmidtmann, G.

2025-04-24 neuroscience 10.1101/2025.04.20.649714 medRxiv
Top 0.1%
38.5%

Previous studies have demonstrated that bias, sensitivity and similarity between letters are causes of errors in letter identification. However, these factors and their relative contributions to letter identification have not been investigated extensively. Our previous model (the noisy template model) was devised to calculate the effect of bias and sensitivity in a letter identification task. In the current study, we used the method of constant stimuli to measure letter acuity for Sloan letters at an eccentricity of 7 deg from fixation (temporal visual field). As in our previous work, we devised and tested a variety of models to estimate the joint role of bias and sensitivity, but extended our model to also incorporate the similarity between letters. Modelling results showed that bias is the major factor in determining the pattern of total, correct and incorrect responses in letter identification. Furthermore, the joint effect of similarity and bias was found to be higher than the joint effect of either bias and sensitivity or similarity and sensitivity in shaping the pattern of overall responses in letter identification. Incorporating the similarity factor into the noisy template model improved our understanding of the simultaneous contributions of bias, sensitivity and similarity between letters in the letter identification task.
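
One plausible reading of such a choice rule (an illustrative sketch, not the authors' exact noisy template model) combines a per-letter response bias with sensitivity-scaled letter similarity in a softmax:

```python
import numpy as np

def response_probs(similarity, bias, sensitivity):
    """Illustrative choice rule: P(report letter j | stimulus i) is a
    softmax over sensitivity-scaled similarity plus a per-letter bias."""
    z = sensitivity * np.asarray(similarity, float) + np.asarray(bias, float)
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

# Two letters with symmetric similarity but a bias toward reporting letter 0:
sim = [[1.0, 0.4],
       [0.4, 1.0]]
P = response_probs(sim, bias=[0.5, 0.0], sensitivity=2.0)
print(P)  # letter 0 is reported more often than letter 1, all else equal
```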

15
Surround Suppression of Broadband Images

Pokorny, V. J.; Weldon, K. B.; Olman, C. A.

2024-05-15 neuroscience 10.1101/2024.05.15.594329 medRxiv
Top 0.1%
38.5%

Visual perception is profoundly sensitive to context. Surround suppression is a well-known visual context effect in which the firing rate of a neuron is suppressed by stimulation of its extra-classical receptive field. The majority of contrast surround suppression studies exclusively use narrowband, sinusoidal grating stimuli; however, it is unclear whether the results produced by such artificial stimuli generalize to real-world, naturalistic visual experiences. To address this issue, we developed a contrast discrimination paradigm that includes both naturalistic broadband textures and narrowband grating textures. All textures were matched for first order image statistics and overall perceptual salience. We observed surround suppression across broadband textures (F(1,6)=19.01, p=.005); however, effect sizes were largest for narrowband, sinusoidal gratings (Cohen's d=1.83). Among the three broadband texture types, we observed strongest suppression for the texture with a clear dominant orientation (stratified: Cohen's d=1.29), while the textures with a more even distribution of orientation information produced weaker suppression (fibrous: Cohen's d=0.63; braided: Cohen's d=0.65). We also observed an effect of texture identity on the slope of psychometric functions (F(1.98,11.9)=7.29, p=0.01), primarily driven by smaller slopes for the texture with the most uniform distribution of orientations. Our results suggest that well-known contextual modulation effects only partially generalize to more ecologically valid stimuli.
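
The Cohen's d values reported here are standardized mean differences; a minimal sketch with hypothetical threshold data (the numbers are illustrative, not the study's):

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d: mean difference divided by the pooled (Bessel-corrected)
    standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return float((x.mean() - y.mean()) / np.sqrt(pooled_var))

# Hypothetical contrast-discrimination thresholds with and without a surround:
with_surround = [0.12, 0.15, 0.14, 0.16]
no_surround = [0.08, 0.09, 0.10, 0.09]
print(round(cohens_d(with_surround, no_surround), 2))
```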

16
Natural image statistics at depth edges modulate perceptual stability

Basgoze, Z.; White, D. N.; Burge, J.; Cooper, E. A.

2020-04-06 neuroscience 10.1101/2020.04.05.026724 medRxiv
Top 0.1%
38.2%

Binocular fusion relies on matching points in the two eyes that correspond to the same physical feature in the world. However, not all world features are binocularly visible. In particular, at depth edges parts of a scene are often visible to only one eye (so-called half occlusions). Accurate detection of these monocularly visible regions is likely to be important for stable visual perception. If monocular regions are not detected as such, the visual system may attempt to binocularly fuse non-corresponding points, which can result in unstable percepts. We investigated the hypothesis that the visual system capitalizes upon statistical regularities associated with depth edges in natural scenes to aid binocular fusion and facilitate perceptual stability. By sampling from a large set of stereoscopic natural image patches, we found evidence that monocularly visible regions near depth edges in natural scenes tend to have features more visually similar to the adjacent binocularly visible background region than to the adjacent binocularly visible foreground. The generality of these results was supported by a parametric study of three-dimensional (3D) viewing geometry in simulated environments. In two perceptual experiments, we examined if this statistical regularity may be leveraged by the visual system. The results show that perception tended to be more stable when the visual properties of the depth edge were statistically more likely. Exploiting regularities in natural environments may allow the visual system to facilitate fusion and perceptual stability of natural scenes when both binocular and monocular regions are visible. Précis: We report an analysis of natural scenes and two perceptual studies aimed at understanding how the visual statistics of depth edges impact perceptual stability. Our results suggest that the visual system exploits natural scene regularities to aid binocular fusion and facilitate perceptual stability.

17
Second-order boundaries segment more easily when they are density-defined rather than feature-defined

DiMattina, C.

2023-07-11 neuroscience 10.1101/2023.07.10.548431 medRxiv
Top 0.1%
37.7%

Previous studies have demonstrated that density is an important perceptual aspect of textural appearance to which the visual system is highly attuned. Furthermore, it is known that density cues not only influence texture segmentation but can enable segmentation by themselves, in the absence of other cues. A popular computational model of texture segmentation, the "Filter-Rectify-Filter" (FRF) model, predicts that density should be a second-order cue enabling segmentation. For a compound texture boundary defined by superimposing two single-micropattern density boundaries, a version of the FRF model in which different micropattern-specific channels are analyzed separately by different second-stage filters predicts that segmentation thresholds should be identical in two cases: (1) compound boundaries with an equal number of micropatterns on each side but different relative proportions of each variety (compound feature boundaries), and (2) compound boundaries with different numbers of micropatterns on each side, but with each side having an identical number of each variety (compound density boundaries). We directly tested this prediction by comparing segmentation thresholds for second-order compound feature and density boundaries, each formed by superimposing two single-micropattern density boundaries composed of complementary micropattern pairs differing either in orientation or contrast polarity. In both cases, we observed lower segmentation thresholds for compound density boundaries than for compound feature boundaries, with identical results when the compound density boundaries were equated for RMS contrast. In a second experiment, we considered how two varieties of micropatterns summate for compound boundary segmentation. When two single-micropattern density boundaries are superimposed to form a compound density boundary, we find that the two channels combine via probability summation. By contrast, when they are superimposed to form a compound feature boundary, segmentation performance is worse than for either channel alone. From these findings, we conclude that density segmentation may rely on neural mechanisms different from those underlying feature segmentation, consistent with recent findings suggesting that density comprises a separate psychophysical channel.
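The probability-summation result above has a simple numerical reading: if two independent channels each detect the boundary with some probability, the combined probability of detecting with at least one channel exceeds either alone. A minimal sketch, using the classic high-threshold formulation (the function name and sample probabilities are ours, not the authors'):

```python
def prob_summation(p1, p2):
    """Combined detection probability for two independent channels:
    the boundary is detected if at least one channel detects it."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

# Two channels each detecting at 50% yield 75% combined detection,
# so sensitivity is higher than for either channel alone.
combined = prob_summation(0.5, 0.5)  # → 0.75
```

Note this is only one common way to formalize probability summation; fitted psychophysical models often use related Minkowski-pooling forms instead.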

18
Distance vs. Resolution: Neuromapping of Effective Resolution onto Physical Distance

Arslan, S. S.; Fux, M.; Sinha, P.

2023-08-06 neuroscience 10.1101/2023.08.03.551725 medRxiv
Top 0.1%
37.7%

The main focus of this work is on determining the effective resolution of a face image on the retina when the face is at a particular distance from the eye. Despite its straightforward articulation, arriving at a satisfactory solution proves unexpectedly challenging. The relationship between viewing distance and effective resolution cannot be easily obtained from Snellen acuity, contrast sensitivity, photoreceptor packing density, or ganglion cell convergence rate in the retina. We used theoretical modelling to establish preliminary guidelines and then tested them empirically. We showed participants a 2 x 2 array of images at different resolutions at various viewing distances. At each distance, participants performed an "odd one out" task, identifying the image that was blurrier than the other three. As the study progressed, viewing distance was gradually reduced. The data collected enabled us to determine the upper and lower limits of the effective resolution available to human vision under normal conditions, as a function of viewing distance. Notably, human performance in blur detection is superior to what a theoretical model based on projected image size, cone density, and foveal extent predicts, especially at close range. We therefore propose that future theoretical models must account for non-uniform cone density within the fovea and the less pronounced decline in acuity outside the fovea to establish a reliable relationship between viewing distance and perceived image characteristics. The <distance:effective-resolution> mapping allows for a direct comparison of human face recognition performance across different levels of blur and viewing distance. It also enables us to systematically compare human performance to that of machine vision systems, using resolution as a common factor.
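The geometric starting point for any such distance-to-resolution mapping is the visual angle a face subtends, which shrinks with distance and caps how many resolvable samples the retina can extract. A minimal sketch under a deliberately simplified assumption of a uniform acuity limit (function names and the 30 cycles/degree default are illustrative, not the paper's model, which the abstract notes must go beyond uniform-acuity assumptions):

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given
    physical size viewed at a given distance."""
    return 2.0 * math.degrees(math.atan(size_cm / (2.0 * distance_cm)))

def effective_resolution_samples(size_cm, distance_cm, acuity_cpd=30.0):
    """Rough upper bound on resolvable samples across the image,
    assuming a uniform acuity limit in cycles/degree and Nyquist
    sampling (2 samples per cycle). Illustrative only."""
    return 2.0 * acuity_cpd * visual_angle_deg(size_cm, distance_cm)

# A 10 cm face at 57 cm subtends roughly 10 degrees; doubling the
# distance roughly halves the subtended angle and the sample budget.
angle = visual_angle_deg(10.0, 57.0)
```

Because blur detection outperforms such uniform models at close range, the abstract argues the real mapping needs eccentricity-dependent acuity terms.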

19
Foveated metamers of the early visual system

Broderick, W. F.; Rufo, G.; Winawer, J.; Simoncelli, E.

2023-05-22 neuroscience 10.1101/2023.05.18.541306 medRxiv
Top 0.1%
37.7%

The ability of humans to discriminate and identify spatial patterns varies across the visual field, and is generally worse in the periphery than in the fovea. This decline in performance is revealed in many kinds of tasks, from detection to recognition. A parsimonious hypothesis is that the representation of any visual feature is blurred (spatially averaged) by an amount that differs for each feature, but that in all cases increases with eccentricity. Here, we examine models for two such features: local luminance and spectral energy. Each model averages the corresponding feature in pooling windows whose diameters scale linearly with eccentricity. We performed perceptual experiments with synthetic stimuli to determine the largest window scaling for which human and model discrimination abilities match (the "critical" scaling). We used much larger stimuli than those of previous studies, subtending 53.6 by 42.2 degrees of visual angle. We found that the critical scaling for the luminance model was approximately one-fourth that of the energy model and, consistent with earlier studies, that the estimated critical scaling value was smaller when discriminating a synthesized stimulus from a natural image than when discriminating two synthesized stimuli. Moreover, we found that initializing the generation of the synthesized images with natural images reduced the critical scaling value when discriminating two synthesized stimuli, but not when discriminating a synthesized from a natural image stimulus. Together, the results show that critical scaling is strongly affected by the image statistic (pooled luminance vs. spectral energy), the comparison type (synthesized vs. synthesized or synthesized vs. natural), and the initialization image for synthesis (white noise vs. natural image). We offer a coherent explanation for these results in terms of alignments and misalignments of the models with human perceptual representations.
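The central quantity in the pooling-model framework above is a window whose diameter grows linearly with eccentricity, so a single scaling constant describes the whole visual field. A minimal sketch (the function name and the scaling values 0.06 and 0.24 are illustrative placeholders, not the paper's fitted estimates; only their reported ~4x ratio is taken from the abstract):

```python
def window_diameter(eccentricity_deg, scaling):
    """Diameter of a pooling window that scales linearly with
    eccentricity, the core assumption of the foveated models."""
    return scaling * eccentricity_deg

# At 10 deg eccentricity, a luminance-model window (smaller critical
# scaling) is about one-fourth the size of an energy-model window,
# mirroring the ~4x difference in critical scaling reported above.
lum_diam = window_diameter(10.0, 0.06)     # illustrative luminance-model scaling
energy_diam = window_diameter(10.0, 0.24)  # illustrative energy-model scaling
```

A smaller critical scaling means smaller windows everywhere, hence a less compressed representation is needed before the model's blur becomes perceptible to observers.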

20
Segmenting luminance-defined texture boundaries

DiMattina, C.; Baker, C. L.

2020-11-09 neuroscience 10.1101/2020.06.27.175505 medRxiv
Top 0.1%
34.2%

Segmenting scenes into distinct surfaces is a basic visual perception task, and luminance differences between adjacent surfaces often provide an important segmentation cue. However, mean luminance differences between two surfaces may exist without any sharp change in albedo at their boundary, arising instead from differences in the proportion of small light and dark areas within each surface, e.g. texture elements; we refer to this as a luminance texture boundary. Here we investigate the performance of human observers segmenting luminance texture boundaries. We demonstrate that a simple model involving a single stage of filtering cannot explain observer performance unless it incorporates contrast normalization. Performing additional experiments in which observers segment luminance texture boundaries while ignoring superimposed luminance step boundaries, we demonstrate that the one-stage model, even with contrast normalization, cannot explain performance. We then present a Filter-Rectify-Filter (FRF) model positing two cascaded stages of filtering, which fits our data well and explains observers' ability to segment luminance texture boundary stimuli in the presence of interfering luminance step boundaries. We propose that such computations may be useful for boundary segmentation in natural scenes, where shadows often give rise to luminance step edges that do not correspond to surface boundaries.
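The FRF cascade named in both texture abstracts has a compact computational core: a first linear filter, a pointwise rectification, then a second, coarser linear filter that exposes region differences invisible to a single linear stage. A minimal 1-D sketch, not the authors' fitted model (the signal, kernels, and function name are illustrative):

```python
import numpy as np

def frf_response(signal, first_kernel, second_kernel):
    """Minimal 1-D Filter-Rectify-Filter cascade: first-stage linear
    filtering, pointwise rectification, then coarse second-stage
    filtering of the rectified output."""
    stage1 = np.convolve(signal, first_kernel, mode="same")    # first linear filter
    rectified = np.abs(stage1)                                 # rectification
    return np.convolve(rectified, second_kernel, mode="same")  # second linear filter

# Two texture regions with the same mean luminance (zero) but different
# modulation amplitude: a single linear filter averages to ~zero on both
# sides, while the FRF cascade yields a step the second stage can detect.
left = 0.2 * np.sin(np.linspace(0.0, 20.0 * np.pi, 200))   # low-amplitude texture
right = 1.0 * np.sin(np.linspace(0.0, 20.0 * np.pi, 200))  # high-amplitude texture
x = np.concatenate([left, right])
first = np.array([1.0, -1.0])    # tiny derivative-like first-stage filter
second = np.ones(50) / 50.0      # coarse averaging second-stage filter
resp = frf_response(x, first, second)
```

With these kernels, the second-stage response is markedly larger over the high-amplitude region, illustrating how a second-order cue survives the cascade even though both regions have identical mean luminance.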